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Abstract 

The binary Mumford-Shah model is a widespread tool for image segmentation and can be 
considered as a basic model in shape optimization with a broad range of applications in com¬ 
puter vision, ranging from basic segmentation and labeling to object reconstruction. This paper 
presents robust a posteriori error estimates for a natural error quantity, namely the area of the 
non properly segmented region. To this end, a suitable strictly convex and non-constrained re¬ 
laxation of the originally non-convex functional is investigated and Repin’s functional approach 
for a posteriori error estimation is used to control the numerical error for the relaxed problem 
in the L 2 -norm. In combination with a suitable cut out argument, a fully practical estimate for 
the area mismatch is derived. This estimate is incorporated in an adaptive meshing strategy. 
Two different adaptive primal-dual finite element schemes, and the most frequently used finite 
difference discretization are investigated and compared. Numerical experiments show qualita¬ 
tive and quantitative properties of the estimates and demonstrate their usefulness in practical 
applications. 


1 Introduction 

Since the introduction of the image denoising and edge segmentation model by Mumford and Shah 
in the late 80’s [30), there has been much effort to find effective and efficient numerical algorithms 
to compute minimizers of different variants of this variational problem. The original model is based 
on the functional E M sIao K] = Jq\k \^ u \ 2 + a ( u ~ u o ) 2 dx + f3'H n ~ 1 (K) with a, (3 > 0, where 
u$ : Cl -> R is a scalar image intensity on the image domain Cl c R n , u the reconstructed image 
intensity and K the associated set of edges, on which the image intensity u jumps. Here, W 1-1 
denotes the (n - 1)-dimensional Hausdorff measure. The space of functions of bounded variation 
BV(Cl) turned out to be the proper space to formulate the problem in a mathematical rigorous way. 
Indeed, existence in the context of the space of special functions of bounded variation SBV(Cl) 
was proved by Ambrosio (see 13 Theorem 4.2]). For details on these spaces, we refer to 0). 
Restricting u to be piecewise constant instead of piecewise smooth, one is lead to a basic and 
widespread image segmentation model. This model is discussed from a geometric perspective 
in the book by Morel and Solimini [29]. In the case of just two intensity values c\ and C 2 , the 
associated energy can be rewritten in terms of a characteristic function xe£>U(£2,{0,l})as 

E[x,ci,c 2 ]= f #ix + # 2(1 - x) dx + |Dx|(f2) • (1.1) 

Jn 

Here, 6i = \(ci - uq) 2 for i = 1,2, the new weight v - /3/a and the resulting binary model 
is given by u = Cix + C 2 (l~x). For fixed x» one immediately obtains the optimal constants 


1 


c\ = (In X ) 1 Jq X u o dx and C 2 = (/^ 1 - x dx) 1 (1 - x)^o dx. For fixed ci and C 2 one aims 

at minimizing the energy over the non-convex set of characteristic functions x 6 W(Q,{0,1}). 
Nikolova, Esedoglu and Chan ED showed that this non-convex minimization problem can be 
solved via relaxation and thresholding—a breakthrough for both reliable and fast algorithms in 
computer vision |34l |T4]|. Here, at first one asks for a minimizer of E[-,ci,C 2 ] over all u € 
BV(Q,[ 0,1]) and then thresholds u for any threshold value s e [0,1) to obtain the solution 
X = X[u>s] of the original minimization problem. The relaxed problem coincides with a con¬ 
strained version of the classical image denoising model by Rudin, Osher and Fatemi (ROF) l39l . 
Numerical schemes for an effective and efficient minimization of this model have extensively been 
studied. Making use of a dual formulation, Chambolle m introduced an iterative finite difference 
scheme and proved its convergence. Hintermuller and Kunisch [26] proposed a predual formu¬ 
lation for a generalized ROF model and proposed to apply a semismooth Newton method for a 
regularized variant. Chambolle and Pock 1171 deduced a primal-dual algorithm with guaranteed 
first order convergence and applied their approach to different variational models in BV such as 
image denoising, deblurring and interpolation. The scheme is based on an alternating discrete gra¬ 
dient scheme for the discrete primal and the discrete dual problem. Bartels ID used the embedding 
BV(Q) n L°°(Q) ^ Hz(Q) to improve the step-size restriction for BV functionals. Wang and 
Lucier ATI employed a finite difference approximation of the ROF model and derived an a priori 
error estimate for the discrete solution based on suitable projection operators. Follo wing Dob son 
and Vogel (20), the total variation regularization can be approximated smoothly via yf\\7u | 2 + 6. In 
[23], the convergence of the L 2 -gradient flow of this smooth approximation to the TV flow in L 2 
is shown under strong regularity assumptions on the solution. 

Furthermore, approximations of the original Mumford-Shah model have been studied exten¬ 
sively. An early overview of different approximation and discretization strategies was given by 
Chambolle in do). Ambrosio and Tortorelli m proposed a phase field approximation of this func¬ 
tional and proved its T-convergence. Chambolle and Dal Maso m proposed a discrete finite 
element approximation and established its T-convergence. Bourdin and Chambolle (9) picked up 
this approach and studied the generation of adaptive meshes iteratively adapted in accordance to 
an anisotropic metric depending on the current approximate solution. In (40i Shen introduced 
a T-converging approximation of the piecewise constant Mumford-Shah segmentation, where the 
length term in the Mumford-Shah model is approximated via an approach originating from the 
phase field model by Modica and Mortola (28) . A simple and widespread level set approach was 
proposed by Chan and Vese m. 

The goal of this paper is to derive a posteriori error estimates for the binary Mumford-Shah 
model for fixed constants c\ and C 2 . To this end, we proceed as follows: We take into account a 
suitable strictly convex relaxation of the binary Mumford-Shah functional already studied in (8), 
which is related to more general relaxation approaches suggested by Chambolle m (see Section 
[2]). For this relaxation, we take into account its predual and set up a corresponding primal-dual 
algorithm (71,17, 25] (see Section [3]). Then, following Bartels (7), we use Repin’s primal-dual 
approach [ 36, 37] to derive functional a posteriori error estimates for the relaxed solution based on 
upper bounds of the duality gap (cf. also the book by Han (24l with respect to mechanical appli¬ 
cations) (see Section [4]). These estimates can be used together with a suitable cut out argument to 
derive an a posteriori estimate for the characteristic function x minimizing the original functional 
(see Section [5]). Moreover, two adaptive finite element discretization schemes and one con¬ 
ventional, non adaptive finite difference scheme are investigated (see Section [6]). Finally, we apply 
the resulting estimate to these schemes incorporating an appropriate post smoothing and present 
the numerical results (see Sections [7] and [8). 
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2 A Strictly Convex Relaxation of the Binary Mumford-Shah 
Model 


Henceforth, we use the notation xa to denote the indicator function of a measurable set A c Cl and 
define [u > c] := {x e Cl : u(x ) > c}. Furthermore, we use generic constants c and C throughout 
this paper. Rewriting the binary Mumford-Shah functional HD as 

E[x,c 1 ,c 2 ]= f (0i - # 2 )x da: + |D%|(f2) + f 0 2 dx, (2.1) 

Jn Jn 

one observes that adding a constant to 0\ and 62 leaves the minimizers x unchanged. Thus, we 
may assume that 0i, 02 > c > 0. Now, let us introduce the following relaxed functional 

E K \u ] = J' u 2 6 i + (1 - u) 2 0 2 dx + |Du|(fi), (2.2) 

which is supposed to be minimized over all u e RU(n,R). Indeed, E TQl [x\ = E[x,c 1 ^ 2 ] for 
characteristic functions x and one retrieves the original binary Mumford-Shah model. Proving ex¬ 
istence of minimizers of ( |2.2| ) with the direct method in the calculus of variations is straightforward. 
For details we refer to I4l l22l . Furthermore, ( |2.2| ) is strictly convex by our above assumptions on 
0\ and 62 • Loosely speaking, a preference for the values 0 and 1 for u is encoded in the quadratic 
growing data term. The minimizers of both functionals HD and \22) are related in the following 
sense (cf. 0): 

Proposition 2.1 (Convex relaxation and thresholding). Under the above assumptions, a minimizer 
u € BV (Cl) of the functional E rel exists, u(x) € [0,1 ]for a.e. x eft and 

X[u>o. 5 ] e argmin E[x\ • 

X zBV(n,{ 0 , 1 }) 

Proof Proposition |2.1 1 is an instance of a more general result, which can be found in [ 12, 16]. In 
fact, let : Cl x R R be measurable, 4/(*, t) e L X (U) for a.e. t e R, \h(x, •) € C 1 (R) be strictly 
convex for a.e. x e Cl, 4/(x, t ) > c\t\ - C and E^[u\ = f Q 4/(x, u ) dx+ |Du|(fl). Then, there exists 
a minimizer u e argmin,^ E^[u] and for s e R 

X S:= X[n> s ]^ argmin / d t V(x, s)xdx + |Dx|(ft) • 

X eBV(n,{ 0 , 1 }) 


With 4>(x,t) = t 2 0i(x) + (1 - t) 2 62 {x) and s = - this allows to verify the main claim of Propo- 
Indeed, for t e R and x £ BV(Cl , {0,1}), let F^ el [x] : = / n t)xdx + |Dx|(D). 


sition 


2.1 


For our specific choice of T, E r t d [x] = (2t(6i(x) + 62 (x)) - 26^(x)) x& x + |Dx|(U) implies 

that minimizing the functional F7i el is equivalent to minimizing the functional E[-, ci, C 2 ] because 


Sf[x] = -E[x,ci,c 2 ]-/ n 0 2 da:. 

For the sake of completeness, we give here the proof of the more general statement tailored 
to our particular instance. Obviously, i£ rel [min{max{0,u}, 1}] < E TQl [v] for any v € BV(Cl). 
Hence, the minimizer takes values in the range [0,1]. Due to Fubini’s Theorem, we obtain 


[ ^(x : u)dx= [ ^(x,0)dx+ f [ d t ^(x,t)X[u>t]dxdt 
JQ Jo JCl 


for u € BV(Cl , [0,1]). For seR, let x s denote a minimizer of Ef l in the set BV(Cl , {0,1}). 
Using the coarea formula (see |4] Theorem 3.40]) and the minimization property of x s we get 
E tel [u] >Cm+ p Ef [x 5 ] ds, where C* = f a <L(x, 0) dx. 

Next, we make use of the following monotonicity result J2| Lemma 4]: Let h -\, h-> e L 1 (O) 
such that h\ < /i 2 a.e. in D and assume Xi is a minimizer of f Q hixdx+\Dx\(Cl) in RU(U, {0,1}) 
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for i e {1,2}, then xi ^ X 2 a.e. in Cl. Due to the strict positivity of 6 1 and 0 2 we obtain that 
dt^(x^s) is strictly increasing in s for fixed x. Thus, choosing hi = d t ^(x,Si) we get that 
X s 1 > X s2 a - e - i n ^ for 5 i ^ 52 . This implies that u*(x) = sup {s \ x s (x) = 1} is well-defined and 
Lebesgue measurable with x s = X[u*>s]- Furthermore, by the coarea formula we get u* e BV (Cl) 
and C + /q 1 Ef\ X s ] ds = E VGl [u*] . Altogether, we obtain the chain of estimates 

E r&l [u] > C* + r E f [x s ] ds = E TGl [u*] > E TGl [u ], 

J 0 

in which the inequalities are actually equalities. Thus, for a.e. s e [0,1] the characteristic function 
X[w*>s] is a minimizer of E v f. We are left to show that this holds in particular for s = \. To 
this end, we consider a monotonously decreasing sequence s n , which converges to |. For any 
X e BV(Cl,{ 0,1}) we can infer by the dominated convergence theorem and the weak-* lower 
semicontinuity of the total variation (noting that X[u>s n ] converges weak-* in BV to X[ n >^]) 


Ef[x] = liming [X] > liminf £^[ X[u>Sn] ] > E?[ X[u> ij], 

9 n—>00 u n —>00 L J l 2 J 


from which the assertion follows (indeed the more general result holds by the same argument for 
alls e [0,1)). □ 


3 A Primal Dual Approach for the Relaxed Problem 


In this section, we make use of convex analysis to derive a duality formulation for the minimiza¬ 
tion problem of the relaxed functional ( |2.2| ). Primal and dual formulation will later be used in the a 
posteriori estimates. The dual of BV (£2) is very difficult to characterize and not suitable for com¬ 
putational purposes. Thus, for a generalized ROF model, Hintermuller and Kunisch l25l proposed 
to consider the corresponding BV functional as the dual of another functional, which we refer 
to as the predual functional. Bartels CD made use of this approach in the context of a posteriori 
estimates for the ROF model. Here, we follow this procedure and investigate the predual of ( |2.2| ). 

Recall that the Fenchel conjugate J* of a functional J : X -> H on a Banach space X with R = 
IRujoo} is a functional on the dual space X' with values in R, defined as J* \x'] = sup x€X {(a/, x)- 
J\x]}, where (*,*) denotes the duality pairing. Furthermore, we denote by A* e £(Y',X') the 
adjoint operator of A £ £(X, Y) and by dJ the subgradient of J (cf. l2ll ). 

Now, we investigate an energy functional 

D Kl [q]=F[q]+G[Aq] qtQ, (3.1) 


where F : Q -> R and G : V -> R being proper, convex and lower semicontinuous functionals, V 
and Q being reflexive Banach spaces and A g £(Q,V). In our case of the predual of the convex 
relaxed binary Mumford-Shah model, we have 


F[q\ = %,[<?] = 


0 if \q\ < 1 a.e. 
+oo else 



\v 2 + v0 2 - 0i0 2 

0i + 0 2 


dx , 


with A = div, Q = FfAr(div, Q) and V = L 2 (Cl). Recall the definition of the spaces JT(div, Cl) = 
{q e L 2 (D,M n ) : divg e L 2 (D)}, endowed with the norm ||g|#( div ^ = |g||| 2 ( n ) + || divg||| 2 ^^, 
and FfAr(div, Cl) = H{ div, Cl) n {q • v - 0 on <9£2}, where v is the outer normal on dCl and the 
operator div is understood in the weak sense. Moreover, A* = -V holds in the sense 


(A*u, q) L 2 (n) = (v, div q) L 2 (n) Vu € V, qeQ. 

Based on this duality and for the particular choice of D rel , we easily verify that (D re1 )* = £2 el . 
Indeed, from the general theory in [21] pp. 58 ff.], we can deduce (D rd )*[v] = F*[-A*u] +G*[v]. 
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As a result of the denseness of C^iCl) in (div, Cl) with respect to the norm || • ||#(di v ,op we 
can infer for any v e BV(Cl) 


|Dv|(n) 


sup 

qeQ,\\q\\oo<l 


/ vdivqdx 

Jn 


= sup 
qzQ 


J v div qdx - 



which leads to 

F*[-A*v] = sup f- f vdivqdx - [q] j = |Dv|(fi). 

q&Q \ / 

On the other hand, the Fenchel conjugate of G can be computed as follows: 

G*[v]= sup ((v,w) L 2 {q) -G[w]) = f v 2 0 1 + (1 - v) 2 6 2 dx, 
weL 2 (n) Jn 


where the supremum is attained for w = 2v(9i + O 2 ) - 2# 2 - This verifies the assertion. 

Now, the central insight is that 

D Kl [p] = -(D Kl y[u] (3.2) 

for a minimizer p of D rel and a minimizer u of (£) re1 )*. This can be seen by formally exchanging 
inf and sup as follows 

D rQl [p] = inf (F[q] + G[Aq]) = inf sup (F[q] + (v, Aq) - G*[v]) 

q€H N ( div,ft) qeH N (div,Q) ve L 2 (Q) 

= sup I - sup ((-A *v,q) - F[q]) - G*[v]\= sup (-F*[-A*i;] - G*[v]) 

v€L 2 (Q)\q€H N (div,ft) ) v&L 2 (Q) 

= sup (-(Z? reI )*[u]) = - inf (D* l y[v] = -(D^y[u]. 

veL 2 (n) veL 2 (n) 

A rigorous verification can be found in (2U Chapter III.4] (see also [38, 13611711). Furthermore, one 
obtains that q e Q and v e V are optimal if and only if -A*u € dF[q\ and v e dG[Aq\, which can 
be deduced from the equivalence J[x] + J*[x'] = (x',x) x' e dJ\x] (see [21, Proposition 
1.5.1]). 


4 Functional A Posteriori Estimates for the Relaxed Problem 

In what follows, we investigate a posteriori error estimates associated with the energy D VGl \q] = 
F[q ] + G[Aq\ and its dual E Tel [v] = F*[-A*i;] + G*[v]. A crucial prerequisite is the uniform 
convexity of G , which is linked to the specific choice of the relaxed Model E VQl . 

Recall that a functional J : X -> IR is uniformly convex , if there exists a continuous functional 
<Fj : X -+ [0, 00 ) such that j[ - 1+ 2 X2 ] + &j(x 2 - x\) < |( J[xi] + J\x 2 ]) for all x\,x<i e X and 
j(x) = 0 if and only if x = 0. Furthermore, we denote by a non-negative functional such that 
(x\x 2 ~xi) + d/ j(x 2 ~xi) < J\x 2 ]~ J\xi] for all x' e dJ\x 1 ]. Hence, allows a quantification 
of the strict monotonicity of J. If J e C 2 and A min denotes the smallest eigenvalue of D 2 J[0], 
then <!>./ and ^ j admit the representation 

M®) = A mi „(Z? 2 J[0])||x|| 2 and = L mi „(Z? 2 J[0])||x|| 2 , 

o Z 

which follows readily via a Taylor expansion. 

Now, the a posteriori error estimate is based on the following direct application of a general 
result by Repin f36l : Let u e argmin^y E rel [v] and qeQ, veV' = V = L 2 (Cl). Then, 

(u - u) + (A*(u - O) + < i(E^[v] + D rel [q ]) . (4.1) 
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The proof of ( |4.1| ) relies on the following two estimates: At first, 
{u ~ v) + <&f* (A *(u ~v)) 

< 1 (F* [A*v] + G* [v] + F* [A*u] + G* [u]) - (f* 


A : 


.u + v 


+ G* 


u + v 


]) 


which is a consequence of the uniform convexity. Secondly, using the monotonicity and that u is a 
minimizer of E and thus 0 € dE rd \u], we get 




u + v 


+ G* 


u + v 


- (F* [A*iz] + G* [u ]). 


The claim follows by adding both estimates and using the fundamental relation E rd [u] > -D rd [q] 
known as the weak complementarity principle (cf. |2lJl36l[7]]). In the case of the binary Mumford- 
Shah model, we easily compute 


§ F * = 0 , j f v 2 (0 1 + 0 2 )dx, ^ E rd(v) = f v 2 {0 1 + 6 2 )dx 

and the estimate ( |4.1| ) implies for any v e V and q e Q 

JT(« - vf{e 1 + e 2 ) dx < e k '[v] + D K \q]. 


Finally, \{a- b) 2 < a 2 + b 2 with a = c\ - uq and b = c 2 - uq yields A_(ci - c 2 ) 2 <6 i + 0 2 . Thus, 
we obtain the following theorem: 

Theorem 4.1. Let u eV be the minimizer of E rel . Then, for any v e V and q e Q it holds that 


(4.2) 


In the application, one asks for (post processed) discrete primal v and dual solution q which 
ensure a small right hand side. Additionally, the estimator err u is consistent , i.e. err u [v,q] -> 
0 provided v and q converge to the extrema of the corresponding energy functionals w.r.t. the 
topology of the associated Banach spaces. 


5 A Posteriori Error Estimates for the Binary Mumford-Shah 
Model 

In the sequel, we expand the a posteriori theory to the binary Mumford-Shah model. The key 
observation is that for many images approximate solutions u e L 2 ( ft) of the relaxed model are 
characterized by steep profiles, where the actual solution of the original binary Mumford-Shah 
model jumps. Thus, we proceed as follows. We define 

- ||x[|-t/<v<i+ 77]|| l1 ^ 

for p e (0, |), which measures the area of the preimage of the interval of size 2rj centered 
at the threshold value s = | (cf. Section [5]). Based on the above observation, the set S' v = 
-rj <v < 4 + 77 ] can be regarded as the set of non properly identified phase. Taking into ac¬ 
count this definition, we obtain the following theorem. 

Theorem 5.1 (A posteriori error estimator for the binary Mumford-Shah model). For fixed c\ and 
c 2 let x e BV ($2, {0,1}) be a minimizer of the binary Mumford-Shah functional E\^ ci, c 2 \ ClU- 
Then for all v e V = L 2 (Q) and q e Q = Hjy (div, fl) we have that 

|x-x [v >i]|| < err x [v,g] : = inf U[v,v] + fevi 2 u [v,q]\ . (5.1) 

V ) 2 1 ' I 
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Let us remark that X\ v >±] is the result of the same thresholding, which relates x to the solution 
u of the relaxed problem ( |2 2| ), i.e. x - X[ u >\]> this time applied to v. 

Proof. Recall that any minimizer u of the E rd fulfills 0 < u < 1. For all 77 e (0, |) we obtain the 
following set relation for the symmetric difference of the sets [u > |] and [v > |] (A denoting the 
symmetric difference of two sets): 

[u > |] A [v > |] c c Q | ^ - r] < v(x) < \ + 77 } u [x e £2 | \u - v(x)\ > 77 } . 

Now, using Theorem [47T| the Lebesgue measure of the rightmost set can be estimated as follows 

C n (\u -v\>T)) < f p da: < 2 e rr 2 u [v,q\, 

J{\u-v\>r]} T] z T] z 

where 77 e (0, |). Finally, taking the infimum for all 77 e (0, |) concludes the proof. □ 

In the application, the computational cost to find the optimal 77 is of the order of the degrees 
of freedom for the discrete solution and thus affordable. Let us emphasize that the error estimator 
err x is not tailored to a specific finite element approach. Indeed, we can project any primal and 
dual solution onto the spaces V = L 2 (£i) and Q - div, £2), respectively. We will exploit this 
in the next sections. 


6 Finite Element and Finite Difference Discretization 

In this section, we investigate different numerical approximation schemes for the primal and the 
dual solution of the relaxed problem \22) on adaptive meshes and the refinement of the meshes 
based on the a posteriori error estimate in Theorem |5.1| In the context of image processing ap¬ 
plications with input images usually given on a regular rectangular mesh, an adaptive quadtree for 
77 = 2 (or octree for n = 3) turned out to be an effective choice for an adaptive mesh data structure. 
In what follows, we pick up the finite element approach for a variational problem on BV proposed 
by Bartels (6) and a simplified version of the latter. Furthermore, we consider the widespread finite 
different scheme proposed by Chambolle ifTTTl . 


(FE) Finite element scheme on an induced adaptive triangular grid. We consider £2 = [0, l] 2 

in all numerical experiments in this paper. On this domain, we consider an adaptive mesh Mh 
described by a quadtree with cells ^ e Mh being squares, which are recursively refined into four 
squares via an edge bisection. We suppose that the level of refinement between cells at edges 
differs at most by one. Thus, on a single edge at most one hanging node appears. Let h indicate the 
spatially varying mesh size function on (2, i.e. in the range of an initial mesh size 2 -Linit and a finest 
mesh size 2 _L ° (usually determined by the image resolution). For all discretization approaches 
investigated here, the degrees of freedom are associated with the non hanging nodes. Let us denote 
by N v the number of these nodes, which will coincide with the number of degrees of freedom 
of discrete primal functions. The finite element discretization is based on a triangular mesh Sh 
spread over the adaptive quadtree mesh via a splitting of each quadratic leaf cell into simplices & 
( “cross subdivision”). We ask for discrete primal functions Uh in the space of piecewise affine 
and globally continuous functions on Sh denoted by Vh- Thus, for functions Vh £ Vh the values 
at hanging nodes are interpolated based on the values at adjacent non hanging nodes, which are 
associated with the actual degrees of freedom. By g : qh • v - 0 on <9£2} we denote 

the discrete counterpart of Q. To accommodate this boundary condition, the boundary nodes are 
modified after each update of the dual solution in a post-processing step. On Vh , we define discrete 
counterparts of the continuous functionals F and G as follows: 


Gh 



\vl+v h 0 2 ,h-0i,h02,h , 

- ----- ax 

@l,h + $2 ,h 


Fh \qh] — Ibi [^] 5 


7 




where 0^ = ^ (c* - ?xo) 2 ) for i = 1,2 with denoting the Lagrange interpolation. In 

the application on images, we suppose that € Vo, where Vo is the simplicial finite element space 
corresponding to the full resolution image on the finest grid level L 0 representing the full image 
resolution. Furthermore, we consider two different scalar products. On Vh, we take into account 
the L 2 - product and on Qh the lumped mass product ( qh,Ph ) ^ fn^h(QhPh) dx and identify Vh 
and Qh with their dual spaces with respect to the L 2 - and the lumped mass product, respectively. 
Then, the associated dual operators are 

G* h [v h }= [ vl0i ih + (l- v h ) 2 02,hdx, F£[q h ]= f l h (\q h \)dx. 

Finally, we define the discrete divergence : Qh -> Vh, Qh ^ Vh div^, where Vh denotes the 
L 2 -projection Vh : L 2 (Q) -+ Vh • Following Bartels Q and taking into account the above scalar 
products on Vh and on Qh, we obtain for the discrete gradient -A£ : Vh Qh, v h ^ -A * h Vh, the 
defining duality 

f F h (-A-* h v h -q h ) da;= j v h V h div q h dx ( 6 . 1 ) 

J J 

for all q h e Q h and v h 6 V/, . 


(FE’) Finite element scheme based on a simple gradient operator. Instead of the above defined 
discrete gradient operator -A£, we alternatively consider the piecewise constant gradient Vvh on 
the simplices 3F of the simplicial mesh for functions Vh e Vh- To this end, we choose Qh as the 
space of piecewise constant functions on the simplicial mesh, and take into account the standard 
L 2 -product on both spaces. The above definitions of the functionals Gh and Fh are still valid. 
Moreover, G £ remains the same, only F £ changes to F £ [qh] = f n \qh\dx. The discrete divergence 
A/i : Qh Vh is defined via duality starting from the preset discrete gradient as 

/ l h (A h qhV h ) dx = - / q h -Vv h dx, 

Jn Jn 

which indeed ensures that -A^u^ = Vu^. This simplified ansatz leads to a non-conforming iterative 
solution scheme (see Section |7j, since the space of piecewise constant finite elements is not con¬ 
tained in Hjsf(d iv, Q) (cf. 0). After each modification of the (piecewise constant) dual solution 
the values on the corresponding boundary cells are set to 0 to satisfy the boundary condition. To 
apply the derived a posteriori error estimates a projection onto the space FAv(div, Q) is required. 
To this end, we replace the solution ph e Qh by its L 2 -projection onto the space 


(FD) Finite difference scheme on a regular mesh. The finite difference scheme for the nu¬ 
merical solution of functionals on BV proposed by Chambolle CD is extensively used in many 
computer vision applications and applies to image data defined on a structured non adaptive mesh. 
We compare the a posteriori error estimator for this scheme on non adaptive meshes with the 
above finite element schemes on adaptive meshes. To this end, we denote by V/j g M, Nv and 
Q h £ R 27V ' t; nodal vectors on the regular lattice for primal and dual solutions, respectively. Here, 
N v = (/T 1 +1) 2 , where h denotes the fixed grid size of the finite difference lattice. Integration is re¬ 
placed by summation and we obtain the following discrete analogues Gh and of the continuous 
functionals F and G as functions on M> Nv and R 27Vv , respectively: 


G h [V h ] := £ 


MKn) 2+ n 0 L-©L©2,,i' 




W 1 ,h ^ w 2 ,h 


F/JQ/J ,_max^ Igi [Q/J 


with ©i ft, ©2 h denoting the pointwise evaluation of 6 \ and 62 , respectively, and lp l [QJJ = 0 for 
|Q)J < 1 and +00 otherwise. The associated dual operators for the standard Euclidean product as 
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the duality pairing are 


N v N v 

G* h [v h ] = E(vl) 2 ei, ft + (i - vl) 2 ©U, KiQh] = £ |Ql| • 

i =1 i=l 

Finally, we take into account periodic boundary conditions (by identifying degrees of freedom on 
opposite boundary segments) and use forward difference quotients to define the discrete gradient 
operator -A£ : H Nv R 2A S i.e. 


((-K)v h y = 


f \T^(id) _ \ri ' 

V h v h 


f j= 1,2 


where is the index of the neighboring node in direction of the jth coordinate vector. As 

a consequence, the matrix representing the discrete divergence operator : R 2¥v R ¥t; is just 
the negative transpose of the matrix representing the discrete gradient and thus corresponds to a 
discrete divergence based on backward difference quotients. 

To use the a posteriori error estimate in the finite difference context, we consider as a simplest 
choice the piecewise bilinear functions Uh and ph uniquely defined by the solution vectors and 
P h, respectively. The boundary condition is taken care of in exactly the same way as in the case 
(FE). 


7 Implementation based on a Primal-Dual Algorithm 

For the numerical solution of the different discrete variational problems, we use the primal-dual 
algorithm proposed by Chambolle and Pock (171 Algorithm 1], which computes both a discrete 
primal and a discrete dual solution to be used in the a posteriori error estimates. Note that we 
use El Algorithm 1] instead of El Algorithm 2] even though is uniformly convex. As we 
will see later, evaluating (Id + r<9G^) -1 requires the inversion of a matrix depending on r. In 
Algorithm 1, r is fixed and the inverse can be computed once using a Cholesky decomposition 
for the sparse, symmetric and positive-definite matrix (for details see El), while in Algorithm 2 
the decomposition of the linear system has to be performed in each iteration. Before we discuss 
this algorithm in the more conventional matrix-vector notation, let us rewrite the finite element 
approaches correspondingly. Let N v = dimV/*, (the number of non hanging nodes) and N q = 
dim Qh (for (FE) N q = 2 N v and for (FE’) N q is 2 times the number of simplices). In what 
follows, we will use uppercase letters to denote a vector of nodal values, e.g. \ l h = Vh{X l ) if 
X 1 is the ith non hanging node. The two scalar products are encoded via mass matrices. Here, 
Mh € R ¥w,¥u represents the standard L 2 -product on Vh and is given by dx 

for all Vh, Uh tVh- Furthermore, € M, NqiNq is the mass matrix associated with the space Qh. 
For the approach (FE) it is given as the lumped mass matrix with = f n Xh (ph'Qh ) dx for 

all Ph , Qh e Qh, whereas for the discretization (FE’) = J n p h 'Qh dx for allp/*, q h e Q h 

defines a classical (diagonal) mass matrix. For the matrix representations A^ and -A£ of the 
discrete divergence and the discrete gradient, respectively, we obtain the relation (cf. Q) 

Aj^M^Mh. (7.1) 

For the discretization (FD) we have A^ = Aj h . Altogether, the discrete predual energy D r h el : 
R ¥<? -> R and the discrete energy E^ eZ : R ¥v -> R are defined as follows: 

D^[Q h\ = F/ifQft] + G h [A h Q h ], E r h el [V h \ = Fj>[-Aj>V h ] + Gj>[V h ]. 

In the case of both finite element schemes G^, F^, G^ and are defined using the corresponding 
functions on the finite element spaces, e.g. G^fV/J := G* h \vh\- Now, we are in the position 
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to formulate the primal-dual algorithm. For a fixed mesh and initial data (U°,P^) e M, Nv x 
M, Nq the Algorithm [I] proposed by Chambolle and Pock [ 17], Algorithm 1] computes a sequence 
(U^, P^), which converges to the tuple (U^, P/J of the discrete primal and dual solution provided 
rcr | Aft | 2 < 1. Indeed, using inverse estimates for finite elements (see (32| for a computation of 


k = 0; 

while \\U k h +1 - U^loo >THRESHOLD do 
P^ +1 = (Id + (J 9Fh)~ 1 (P^ L - aA^\J k ); 

U k h +1 = (Id + rdGl)-\U k h + rA*P* +1 ); 

\J k+ 1 = 2 lJ k h + 1 -V k h ; 
k = k + 1; 

end 

Algorithm 1: The primal-dual algorithm used to minimize E r y l . 


the constants) the operator norm can be bounded in the case (FE’) as follows: ||A/J 2 < 48(3 + 
2\/2 )h^ in » 279.8 h^ in , where /z m * n denotes the minimal mesh size occurring in Mh- Moreover, 
to estimate the operator norm for the case (FE) we use and obtain 


|A h 


2 


( max max 

Vh € Vh,\\vh\\ L 2=l qh^QhMh || L 2 =1 

( max max 

VhtVhA v h II 1,2=1 qh^Qh,\\qh \\ L 2 =1 


f n Zh(~K v h ■ Qh) dxj 

\ ^ 

J ~ v h V h div q h da; J 


2 


< max \\Ph divq h \\ 2 L 2 (n) < max || div^||| 2(n) < 96(3 + 2V2)h^ in 
qn^Qh, v ' qn^Qh, v ' 

1 , 2=1 II 9^11 1, 2 = 1 


Finally, following [11 1 I we can estimate || A/J 2 < &h^ in for the discretization (FD). 

Suitable stopping criteria are a threshold on the primal-dual gap F r h el [\J^] + D^ eZ [P^] or on 
the maximum norm of the difference of successive solutions U^ +1 - (which we apply here). 
To compute the resolvents (Id + dFh)~ x [Qh] and (Id + 9G^) _1 [V/J we use a variational ansatz 
(for details see (38J), i.e. for the resolvent of a subdifferentiable functional J with an underlying 
scalar product (•,•) we have that 

(Id + rdJ)~ 1 [x] = argmin(x -y,x -y) + 2 rJ{y ). 

y 


The resolvent of Fh for the approaches (FE) and (FD) is given by 
dd + 

with = qh(X l ). In the case (FE’), the above evaluation is performed on each cell. For the 
discretizations (FE) and (FE’), we denote by = J^WhUhVh dx the weighted 

mass matrix for functions Uh,Vh e Vh and weight Wh^Vh- Then, the resolvent of Gh reads as 

(Id + rdGir^Vh] = (M h [l + 2 r(e 1)h + © 2,/*)])' 1 M h (V h + 2 r© 2 ,^) . 


In the case (FD), the resolvent is given by 


(Id + raGD- i ra = 


v h + atkj 2 ,h 


1 + 2 r 


+ ®2 ,h) 


for 1 < i < N v . 
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In our numerical experiments, we have chosen THRESHOLD = 1CT 7 . For the application con¬ 
sidered here, this algorithm turned out to be about 20% - 30% faster than the alternating descent 
method for the Lagrangian for instance used by Bartels Cl Algorithm A’]. 

The adaptive mesh refinement is implemented as follows. Given a mesh and initial data for 
the primal and dual solution, we run the above algorithm and compute the relaxed discrete primal- 
dual solution pair (uh,Ph)- I n case of the finite difference approach (FD), we define them as the 
multilinear interpolation on the cells ^ of the regular mesh. The corresponding discrete solution 
of the original problem ( |2.1| ) is then given as Xh = X[ Uh >±] • Based on Uh and ph , we evaluate the 
local error estimator for every cell % of the full resolution image grid as follows: 

err u,«b i u h,Ph] ■= ( Ci ~^2 ( f^u 2 h Q l + (l-u h ) 2 6 2 + \S/u h \ 

\(dWp h )' 2 +divp h 0 2 - 010-2 \ 

+ - -----da; . 

@1 + #2 / 

To this end, a higher order Gaussian quadrature is used. In fact, for (FE) and (FE’) we use a 
Gaussian quadrature of order 4 on the simplices ^ composing the cell % on the finest mesh 
with full image resolution, where the 0i (i = 1,2) are originally defined, and for (FD) a Gaussian 
quadrature of order 5 directly on the cells The resulting local error estimator for a cell ^ e Mh 
and the global estimator are given as 

^i x [u h ,p h ]= £ err l Xo [u h ,p h \ and err l[u h ,p h ] = £ err l x [u h ,p h ], 

respectively. We mark those cells ^ for refinement for which 

err u,<«? [Uh,Ph] > a max err l <#,[u h ,p h \, 

where a is a fixed threshold in (0,1). Since this method is prone to outliers, we additionally sort 
all local estimators err^ ^ according to their size (starting with the smallest) and mark the cells 
in the upper decile for refinement as well. For the input data from Figure [T] we refine up to the 
resolution of the initial image. 

8 Numerical Results 

In what follows, we show numerical results for four different input images shown in Figure[l] Prior 
to executing Algorithm [T] we choose suitable values for c\ and c 2 by applying Lloyd’s Algorithm 
(see [27|) for the computation of a 2-means clustering (with initial values 0 and 1). The resulting 
values are given in Figure [T] together with the parameters values for v. The pixels of the input 
images are interpreted as nodal values of the function uo on a uniform mesh with mesh size h = 
2 _l ° (Lo = 11 (( a )? (6), (c)), 9 (d)). The algorithm is then started on a uniform mesh of mesh 
size h = 2 _Linit (L init = 5 ((a), (6), (c)), 3 (d )). In all computations we use a = 0.2, r = 10 -5 and 
a = 5 • 10 -5 . We perform 10 cycles of the adaptive algorithm and refine cells until the depth Lq of 
the input image is reached. 

Applying the primal-dual algorithm, we observe local oscillations for both finite element ap¬ 
proaches (FE) and (FE’), which deteriorate the result of the a posteriori estimator (cf. the numerical 
results in Q). Thus, in a post-processing step, we compensate these oscillations prior to the eval¬ 
uation of the estimator by an application of a smoothing filter. The filter is defined via an implicit 
time step of the discrete heat equation using affine finite elements on the underlying adaptive mesh, 
i.e. we apply the operator (M^ + ^S/ l ) _1 M/ l to the solutions, where S/i denotes the stiffness ma¬ 
trix. For the discretization (FE), we choose l = c • where h m i n denotes the minimal mesh 


11 





Figure 1: Input images together with the corresponding image resolution and the model parameters 
ci, C 2 , and v (flower image: photo by Derek Ramsey, Chanticleer Garden, cameraman image: 
copyright by Massachusetts Institute of Technology). 


size of the current adaptive grid, with c = 3 and c = 6 for the primal and the dual solution, re¬ 
spectively. Moreover, in the case (FE’) the smoothing is only applied to the dual solutions with 
parameter i - 0.75 • h 9 ' 9 , where h a denotes the average cell size on the adaptive mesh. In our 
experiments we observed that these smoothing methods and parameters outperformed other tested 
choices for the corresponding discretizations. We call the resulting postprocessed functions Uh and 
ph, respectively, and replace the local error estimator by err^ <g[uh,ph\- 



(a) 

(b) 

(C) 

(d) 

(FE) 

0.022429 

0.078497 

0.124740 

0.204202 

(ci X )2 £M (FE’) 

0.022256 

0.078012 

0.122620 

0.202534 

(FD) 

0.022493 

0.078814 

0.122777 

0.206166 

(FE) 

-0.021735 

-0.075977 

-0.117259 

-0.182598 

< FE ’) 

-0.020950 

-0.070986 

-0.110463 

-0.165980 

(FD) 

-0.021520 

-0.075455 

-0.119865 

-0.182300 

(FE) 

0.000694 

0.002521 

0.007481 

0.021603 

errj; (FE’) 

0.001306 

0.007025 

0.012156 

0.036554 

(FD) 

0.000973 

0.003359 

0.002912 

0.023866 

(FE) 

0.39 

0.3275 

0.2825 

0.305 

‘Hoptimal (FE ) 

0.45 

0.3675 

0.29 

0.3925 

(FD) 

0.4375 

0.345 

0.24 

0.305 

(FE) 

0.008904 

0.038779 

0.175125 

0.393884 

err x (FE’) 

0.009952 

0.071348 

0.236609 

0.65235 

(FD) 

0.008223 

0.0425225 

0.109039 

0.413503 


Table 1: Rescaled dual and primal energy evaluated on the discrete solution (uh,Ph), error estima¬ 
tor for the relaxed solution, optimal threshold 77 optimal computed for Uh and resulting a posteriori 
estimator err x for the L 1 -error of the characteristic function x (after 10 cycles of the adaptive 
algorithm). 


Table [I] lists (scaled) primal and dual energies, err^, r] opt imai (the 77 value corresponding to 
the optimal a posteriori error bound for given err^), and err x for all input images after the 10 th 
refinement step of the adaptive algorithm. The value of err x peaks for the application (d) due to 
the relatively low image resolution. 

Figure [ 2 ] plots the error estimator err^ after each refinement step for all input images and both 
finite element discretizations. In most numerical experiments, the scheme (FE) performs slightly 
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Figure 2 : The values of err^ and err x are displayed in relation to the number of degrees of freedom 
in a log-log plot for the applications (a) (upper left), (b) (upper right), (c) (lower left) and (d) (lower 
right). The plotted error estimator values correspond to the discretizations (FE) (err^ (black line, 
■) and err x (brown line, •)) and (FE’) (err^ (red line, A) and err x (blue line, ▼)), respectively. 


better than the scheme (FE’). For the flower image, the sequence of adaptive meshes and solutions 
resulting from the adaptive algorithm for the discretization (FE’) is depicted in Figure [3] Figure [4] 
displays solutions for the discretization (FE’) and the corresponding adaptive meshes together with 
color coded deciles of , and the graphs of rj afU^, 77] and 77 h* err x . Note that the displayed 
deciles explicitly indicate the sets S v for 77 = 0.1, 0.2, 0.3, 0.4. Moreover, Figures [5] and [ 6 ] show 
the relaxed solution Uh and the thresholded solution Xh for the remaining discretization schemes 
and input images. 

Finally, we applied the above methods to an analytic function consisting of a weighted sum 
of two Gaussian kernels. To this end, in each step the functionals and the error estimator are 
evaluated on the current adaptive grid and not on a prefixed full resolution grid. The results are 
shown in Figure [ 7 ] (with parameters c\ - 0.495349, C 2 = 0.056845 and v = 5 • 1CT 3 ). 


9 Conclusions 

We have investigated the a posteriori error estimation for the binary Mumford-Shah model and 
applied this estimate to two different adaptive finite element discretizations in comparison to a 
nonadaptive finite difference scheme on a regular grid. The proposed finite element discretizations 
in combination with the adaptive meshing strategy lead to a substantial reduction of the required 
degrees of freedom with error values err^ and err x of about the same magnitude as for a standard 
finite difference scheme. To improve the resulting estimate of the duality gap E TQl [v] + D rel \q], 
the finite element schemes (FE) and (FE’) require some oscillation damping smoothing in a post¬ 
processing step. 
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Figure 3: The sequence of solutions Uh and a color coding of the corresponding fineness of the 
adaptive meshes at the 1 st , 2 nd , 3 rd , 4 th and 5 th iteration of the adaptive algorithm applied to the 
input image (c) and computed using the (FE’) discretization. 


The proposed approach to a posteriori estimates for the binary Mumford-Shah model derived 
in this paper can be applied straightforwardly to more general problems in computer vision. In 
fact, the calibration method developed by Alberti, Bouchitte and Dal Maso (U provides a convex 
relaxation of non-convex functionals of Mumford-Shah type via the lifting of a variational problem 
on a n-dimensional domain to a minimization problem over characteristic functions of subgraphs in 
n + 1 dimensions. In the context of non-convex functionals in vision, this approach was studied by 
Pock et al. (33]|34). Applications of such functionals include the computation of minimal partitions 
(H] [35], the depth map identification from stereo images or the robust extraction of optimal flow 
fields (34l . Here, an adaptive mesh strategy is expected to have an even larger pay-off due to the 
increased dimension. 
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